ROX-12921: initialize counter metrics with 0 value #720

Merged 4 commits on Jan 16, 2023

Conversation

@ivan-degtiarenko ivan-degtiarenko commented Jan 11, 2023

Description

This PR introduces 0 as the default value for "slow" counter metrics.
"Slow" here means that the counter is updated rarely, so a long time can pass before it is incremented for the first time.
Alerts use the Prometheus rate function to detect when a counter goes up. For rate to register the first increment, the series must already exist with a zero value before that increment happens.
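
To illustrate why the zero value matters, here is a minimal sketch in Go using a deliberately simplified model of an increase-style query: it needs at least two samples in the window and returns the last value minus the first. Real Prometheus also extrapolates and handles counter resets, which this model ignores; the `sample` type and `increase` function are hypothetical names for illustration only.

```go
package main

import "fmt"

// sample is one scraped value of a counter series at a given scrape index.
type sample struct {
	scrape int
	value  float64
}

// increase is a simplified model of an increase-style query over a window:
// it needs at least two samples and returns last minus first.
// Extrapolation and counter resets are intentionally ignored (assumption).
func increase(window []sample) (float64, bool) {
	if len(window) < 2 {
		return 0, false // not enough samples: the query yields no result
	}
	return window[len(window)-1].value - window[0].value, true
}

func main() {
	// Counter that is first exported only when it is incremented (scrape 3).
	lazy := []sample{{3, 1}, {4, 1}, {5, 1}}
	// Same counter, but exported with 0 from the first scrape.
	initialized := []sample{{1, 0}, {2, 0}, {3, 1}, {4, 1}, {5, 1}}

	cases := []struct {
		name   string
		series []sample
	}{
		{"lazy", lazy},
		{"initialized", initialized},
	}
	for _, c := range cases {
		inc, ok := increase(c.series)
		fmt.Printf("%s: increase=%g ok=%t\n", c.name, inc, ok)
	}
}
```

In this model the lazily exported series reports an increase of 0, because its first visible sample is already 1, while the zero-initialized series reports an increase of 1.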

This PR does not solve the described issue for the central timeout metric, because the approach here works only for metrics whose label values are known in advance, not for labels that change dynamically. The issue for the timeout metric will be fixed on the alert side.
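
The limitation can be sketched with a toy labelled counter in plain Go. This is not the PR's implementation and not the Prometheus client API: `counterVec`, `initWithZero`, the `example_timeout_count` metric, and the `central_id` label are all hypothetical. With the Prometheus Go client, the equivalent step would presumably be touching each known label value once (for example via `WithLabelValues(...).Add(0)`).

```go
package main

import (
	"fmt"
	"sort"
)

// counterVec is a toy stand-in for a labelled counter: a series is only
// exposed once its label value has been touched at least once.
type counterVec struct {
	name   string
	label  string
	values map[string]float64
}

func newCounterVec(name, label string) *counterVec {
	return &counterVec{name: name, label: label, values: map[string]float64{}}
}

// initWithZero creates a series for every label value known up front.
func (c *counterVec) initWithZero(labelValues ...string) {
	for _, v := range labelValues {
		c.values[v] += 0 // creates the entry, keeps an existing count unchanged
	}
}

// inc increments the series for one label value, creating it if needed.
func (c *counterVec) inc(labelValue string) { c.values[labelValue]++ }

// expose renders all existing series, sorted by label value.
func (c *counterVec) expose() []string {
	keys := make([]string, 0, len(c.values))
	for k := range c.values {
		keys = append(keys, k)
	}
	sort.Strings(keys)
	lines := make([]string, 0, len(keys))
	for _, k := range keys {
		lines = append(lines, fmt.Sprintf("%s{%s=%q} %g", c.name, c.label, k, c.values[k]))
	}
	return lines
}

func main() {
	// Static label values: the full set is known, so it can be pre-initialized.
	ops := newCounterVec("acs_fleet_manager_central_operations_total_count", "operation")
	ops.initWithZero("create", "delete")
	for _, l := range ops.expose() {
		fmt.Println(l)
	}

	// Dynamic label values (hypothetical metric): nothing can be pre-initialized,
	// so the series first appears with value 1.
	timeouts := newCounterVec("example_timeout_count", "central_id")
	fmt.Println("series before first timeout:", len(timeouts.expose()))
	timeouts.inc("central-a")
	for _, l := range timeouts.expose() {
		fmt.Println(l)
	}
}
```

The second counter shows why dynamic labels fall outside this approach: there is no list of label values to initialize before the first event occurs.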

Checklist (Definition of Done)

  • Unit and integration tests added
  • Added test description under Test manual
  • Evaluated and added CHANGELOG.md entry if required
  • Documentation added if necessary (i.e. changes to dev setup, test execution, ...)
  • CI and all relevant tests are passing
  • Add the ticket number to the PR title if available, i.e. ROX-12345: ...
  • Discussed security and business related topics privately. Will move any security and business related topics that arise to private communication channel.

Test manual

  1. Deploy local fleet-manager
  2. Open http://localhost:8080/metrics and check that the worker counter metrics and operations counter metrics are present with a value of 0:
# HELP acs_fleet_manager_reconciler_failure_count count of failed operations of the backgroup reconcilers
# TYPE acs_fleet_manager_reconciler_failure_count counter
acs_fleet_manager_reconciler_failure_count{worker_type="accepted_dinosaur"} 0
acs_fleet_manager_reconciler_failure_count{worker_type="central_auth_config"} 0
acs_fleet_manager_reconciler_failure_count{worker_type="deleting_dinosaur"} 0
acs_fleet_manager_reconciler_failure_count{worker_type="dinosaur_dns"} 0
acs_fleet_manager_reconciler_failure_count{worker_type="preparing_dinosaur"} 0
acs_fleet_manager_reconciler_failure_count{worker_type="provisioning_dinosaur"} 0
acs_fleet_manager_reconciler_failure_count{worker_type="ready_dinosaur"} 0

and

# HELP acs_fleet_manager_central_operations_success_count number of successful central operations
# TYPE acs_fleet_manager_central_operations_success_count counter
acs_fleet_manager_central_operations_success_count{operation="create"} 0
acs_fleet_manager_central_operations_success_count{operation="delete"} 0
# HELP acs_fleet_manager_central_operations_total_count number of total central operations
# TYPE acs_fleet_manager_central_operations_total_count counter
acs_fleet_manager_central_operations_total_count{operation="create"} 0
acs_fleet_manager_central_operations_total_count{operation="delete"} 0
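
The manual check above could also be scripted. Below is a rough sketch in Go that parses metrics text and collects the values of one metric; `seriesValues` is a hypothetical helper, and the parsing is simplified (it assumes no timestamps after the value and no spaces inside label values).

```go
package main

import (
	"bufio"
	"fmt"
	"strconv"
	"strings"
)

// seriesValues returns the value of every sample line of the given metric,
// keyed by the full series (name plus labels). Comment lines are skipped.
func seriesValues(exposition, metric string) map[string]float64 {
	out := map[string]float64{}
	sc := bufio.NewScanner(strings.NewReader(exposition))
	for sc.Scan() {
		line := strings.TrimSpace(sc.Text())
		if line == "" || strings.HasPrefix(line, "#") {
			continue
		}
		// Simplification: treat the line as "<series> <value>".
		i := strings.LastIndex(line, " ")
		if i < 0 {
			continue
		}
		series := line[:i]
		if series != metric && !strings.HasPrefix(series, metric+"{") {
			continue
		}
		v, err := strconv.ParseFloat(line[i+1:], 64)
		if err != nil {
			continue
		}
		out[series] = v
	}
	return out
}

func main() {
	// Excerpt of the output listed in the test manual.
	const body = `# TYPE acs_fleet_manager_central_operations_total_count counter
acs_fleet_manager_central_operations_total_count{operation="create"} 0
acs_fleet_manager_central_operations_total_count{operation="delete"} 0
`
	got := seriesValues(body, "acs_fleet_manager_central_operations_total_count")
	allZero := true
	for _, v := range got {
		if v != 0 {
			allZero = false
		}
	}
	fmt.Printf("series=%d allZero=%t\n", len(got), allZero)
}
```

Against a running local fleet-manager, the `body` constant would be replaced by the response of an HTTP GET to http://localhost:8080/metrics.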

openshift-ci bot commented Jan 11, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ivan-degtiarenko ivan-degtiarenko temporarily deployed to development January 11, 2023 19:30 — with GitHub Actions Inactive
pkg/metrics/metrics.go (outdated, resolved)
@ivan-degtiarenko ivan-degtiarenko temporarily deployed to development January 12, 2023 17:26 — with GitHub Actions Inactive
@ivan-degtiarenko ivan-degtiarenko force-pushed the ROX-12921/initialize-prometheus-metrics-with-0 branch from e530bb3 to a7c3046 Compare January 12, 2023 17:28
@ivan-degtiarenko ivan-degtiarenko temporarily deployed to development January 12, 2023 17:35 — with GitHub Actions Inactive
@ivan-degtiarenko ivan-degtiarenko marked this pull request as ready for review January 12, 2023 17:56
@ivan-degtiarenko ivan-degtiarenko temporarily deployed to development January 12, 2023 17:56 — with GitHub Actions Inactive
@ivan-degtiarenko
Contributor Author

/retest

// We do not initialize observatorium request count metric as it is unused.
// We do not initialize database request count metric as it grows quickly and would not suffer from being undefined.
// We initialize reconciler metrics in InitReconcilerMetricsForType.
func InitOperationMetricsWithZero() {

I think we can leave out the operations metrics completely for the same reason as the database request count. Then we don't need the comment, which seems like it could go stale very fast - e.g. once we remove the Observatorium code, or decide to use it after all. So I would propose to just do the minimum here, and only init the reconciler metrics.

@ivan-degtiarenko ivan-degtiarenko temporarily deployed to development January 13, 2023 11:36 — with GitHub Actions Inactive

openshift-ci bot commented Jan 13, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: ivan-degtiarenko, stehessel

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ivan-degtiarenko ivan-degtiarenko merged commit 3d62108 into main Jan 16, 2023
@ivan-degtiarenko ivan-degtiarenko deleted the ROX-12921/initialize-prometheus-metrics-with-0 branch January 16, 2023 09:54
2 participants